feat: add focused LLM slice files (llms-api.txt, llms-node-ops.txt) #802
crtahlin wants to merge 4 commits
Add two task-specific documentation bundles via `docusaurus-plugin-llms` `customLLMFiles`, so AI agents can load only the docs relevant to their task instead of the full 630KB llms-full.txt:
- llms-api.txt (220KB, 20 docs): API usage, uploads, stamps, feeds, chunks, encryption, developer tooling
- llms-node-ops.txt (225KB, 22 docs): installation, configuration, monitoring, staking, backups, upgrades, FAQ

Both use glob patterns so new pages added under those directories are automatically included; no manual maintenance needed.

Also adds slice file references to static/llms.txt for agent discovery, and extends the validation script to log referenced slice files.

Refs: ethersphere/DevRel#840
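For readers who have not seen the config, the shape of such a plugin entry might look roughly like the sketch below. Only the file names, the `includePatterns` / `fullContent` / `description` option names, and the `docs/bee/installation/**` glob come from this thread; the titles, the `docs/develop/**` glob, and the variable name `llmsPlugin` are placeholders and not the values committed in the PR.

```javascript
// Sketch of a docusaurus.config.js plugin entry. Titles and the
// docs/develop/** glob are assumptions made for illustration only.
const llmsPlugin = [
  "docusaurus-plugin-llms",
  {
    customLLMFiles: [
      {
        filename: "llms-api.txt",
        title: "API slice (placeholder title)",
        description:
          "API usage, uploads, stamps, feeds, chunks, encryption, developer tooling",
        // Assumed glob; the PR lists exact paths plus globs.
        includePatterns: ["docs/develop/**"],
        fullContent: true,
      },
      {
        filename: "llms-node-ops.txt",
        title: "Node operations slice (placeholder title)",
        description:
          "Installation, configuration, monitoring, staking, backups, upgrades, FAQ",
        // This glob is quoted in the review; the real entry has more patterns.
        includePatterns: ["docs/bee/installation/**"],
        fullContent: true,
      },
    ],
  },
];

// List the slice files this entry would ask the plugin to generate.
const sliceNames = llmsPlugin[1].customLLMFiles.map((f) => f.filename);
console.log(sliceNames);
```

In a real `docusaurus.config.js` this array would sit inside the `plugins` list; it is kept as a standalone constant here so the snippet runs on its own.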
✅ Deploy Preview for test-twitter-preview-testing-3 ready!
darkobas2
left a comment
Nice idea — the slices are exactly what agents need to skip the 630KB monolith. A few things before merge:
1. CLAUDE.md looks like a personal config leaking into the upstream repo
- **Always use the `crtahlin` fork/repo** for creating issues, branches, and all GitHub operations — never the upstream `ethersphere` repo.

This is committed to ethersphere/bee-docs (the upstream). If another contributor clones the repo and uses Claude Code, they'll get told to push to your fork. That's clearly not what you want for everyone else.
Two options:
- Keep `CLAUDE.md` here but strip personal/workflow instructions; leave only the things that apply to every contributor (project overview, commands, architecture, conventions).
- Move the personal bits to `~/.claude/CLAUDE.md` (user-scoped, not committed) or to a `.claude/` file you keep in your fork only.
Same goes for **Never mention Claude** — that's a defensible project rule, but you might want to phrase it as "no AI-attribution noise in commits/issues" since CLAUDE.md itself is literally mentioning Claude in the repo.
2. Two includePatterns don't match any file — silently dropped
I checked the actual docs/develop/ tree against the patterns:
- `docs/develop/act.md`: doesn't exist. The file you almost certainly want is `docs/develop/access-control.md` (ACT == Access Control Trie).
- `docs/develop/gateway-proxy.md`: doesn't exist as a top-level develop file. There's `docs/develop/tools-and-features/gateway-proxy.md` (which is already listed) and `docs/develop/gateway.md` (not listed; was that the intended addition?).
The plugin silently drops patterns that don't match, which is why your build still passes and the PR description's "20 docs" count comes out right — but you're losing access-control content.
3. Validation script doesn't catch the above
`const sliceRe = /https:\/\/docs\.ethswarm\.org\/(llms-[a-z-]+\.txt)/g;`

This only confirms that names referenced in `llms.txt` look like slice file names; it doesn't verify that any `includePatterns` actually resolve to files. A typo in `customLLMFiles` (like `act.md`) is invisible until someone diffs the generated slice contents.
Worth adding a check that resolves the globs and warns on patterns matching zero files. Same idea as --fail-on-glob-mismatch in other tooling. Otherwise this regresses silently next time someone refactors filenames.
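To make the gap concrete, here is a small sketch of what that regex does on its own. The regex is the one quoted above; the `llmsTxt` excerpt and the helper name `referencedSlices` are made up for illustration, since the real `static/llms.txt` content is not shown in this thread.

```javascript
// Same pattern as quoted in the review: it extracts slice file names from text.
const sliceRe = /https:\/\/docs\.ethswarm\.org\/(llms-[a-z-]+\.txt)/g;

// Hypothetical excerpt of static/llms.txt (not the real file content).
const llmsTxt = [
  "- https://docs.ethswarm.org/llms-api.txt",
  "- https://docs.ethswarm.org/llms-node-ops.txt",
].join("\n");

// Collect the captured file name from every match in the text.
function referencedSlices(text) {
  return [...text.matchAll(sliceRe)].map((m) => m[1]);
}

const names = referencedSlices(llmsTxt);
console.log(names); // [ 'llms-api.txt', 'llms-node-ops.txt' ]
```

Nothing in this step reads `includePatterns`, so the result is identical whether or not `docs/develop/act.md` exists on disk.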
Minor
- `description` and `rootContent` mostly duplicate each other; if both are emitted into the slice header that's fine, but worth confirming with a quick `head` on the generated files.
- `fullContent: true` on both: is there a use case for the trimmed-content variant for an even smaller slice? Not blocking, just curious about the size/value tradeoff.
Architecture (customLLMFiles + auto-include via globs + discovery links in llms.txt) is solid. Fix items 1 and 2, ideally 3, and this is good to go.
darkobas2
left a comment
Nice cleanup — the slice idea is exactly what coding agents need. Three things to address before merge:
🔴 Two llms-api.txt paths don't exist in the repo
Cross-checked the includePatterns array against master:
| Pattern | Status |
|---|---|
| `docs/develop/act.md` | ❌ does not exist (likely meant `docs/develop/access-control.md`) |
| `docs/develop/gateway-proxy.md` | ❌ does not exist (the file is at `docs/develop/tools-and-features/gateway-proxy.md`, which is already listed; looks like an accidental duplicate at the wrong path) |
The array has 22 entries but the PR description / build log shows "20 documents" — i.e. these two paths matched nothing and were silently dropped. The slice is missing ACT (access control) entirely, which is exactly the kind of thing a developer agent will be asked about.
Fix: replace docs/develop/act.md with docs/develop/access-control.md, and drop the duplicate docs/develop/gateway-proxy.md line.
🟡 CLAUDE.md is personal config, not upstream config
Two rules in the added file are author-specific and don't belong in ethersphere/bee-docs:
Always use the `crtahlin` fork/repo for creating issues, branches, and all GitHub operations — never the upstream `ethersphere` repo.
Never mention Claude in any commit messages, issue titles, issue bodies, branch names, or any other visible output. Do not reference AI assistance.
Both of these are workflow preferences for you working from your fork. If another contributor (or another agent) clones upstream and reads this CLAUDE.md, they'll be told to push to your fork — which is wrong. Suggest either:
- Drop the CLAUDE.md from this PR entirely and keep it as a local-only file (gitignored), or
- Keep an upstream CLAUDE.md but limit it to repo-neutral content (project overview, build commands, conventions) — strip the personal rules.
The "Project Overview / Architecture / Commands / Conventions" sections are useful and worth keeping if you split them.
🟡 validate-llms-txt.mjs change doesn't actually validate
The new block is labelled "Verify slice file references … point to files that will be generated" but the code just logs the references — it doesn't check the include patterns resolve to real markdown files. Had it actually globbed includePatterns against the filesystem, it would have caught the two missing paths above. Worth tightening, since the whole "auto-maintained via globs" story relies on these patterns not silently no-op-ing.
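A minimal sketch of the kind of check meant here, assuming the file list has already been collected from the docs tree. The helpers `globToRegExp` and `unmatchedPatterns` are hypothetical, the matcher handles only `*` and `**` (the plugin's real matcher may support more), and `quick-start.md` is an invented file name.

```javascript
// Convert a simplified glob (only `*` and `**`) into an anchored RegExp.
function globToRegExp(glob) {
  // Escape regex metacharacters, leaving `*` for the glob handling below.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // temporary placeholder for `**`
    .replace(/\*/g, "[^/]*") // `*` stays within one path segment
    .replace(/\u0000/g, ".*"); // `**` may cross directory separators
  return new RegExp(`^${pattern}$`);
}

// Return every pattern that matches none of the known files.
function unmatchedPatterns(patterns, files) {
  return patterns.filter((p) => {
    const re = globToRegExp(p);
    return !files.some((f) => re.test(f));
  });
}

// Stand-in for a directory walk; the last file name is invented.
const files = [
  "docs/develop/access-control.md",
  "docs/develop/tools-and-features/gateway-proxy.md",
  "docs/bee/installation/quick-start.md",
];
const patterns = [
  "docs/develop/act.md",
  "docs/develop/gateway-proxy.md",
  "docs/develop/tools-and-features/gateway-proxy.md",
  "docs/bee/installation/**",
];

const missing = unmatchedPatterns(patterns, files);
console.log(missing); // [ 'docs/develop/act.md', 'docs/develop/gateway-proxy.md' ]
```

A validator could print `missing` as warnings or exit non-zero when it is non-empty, which would surface both broken paths at build time.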
✅ Looks good
- Glob-based `docs/bee/installation/**` for `llms-node-ops.txt`: all 11 install pages resolve.
- All 12 `working-with-bee/*` paths in `llms-node-ops.txt` exist.
- `static/llms.txt` discovery links are clean.
- Splitting a 630KB blob into 220KB / 225KB task-specific bundles is the right move for context budgets.
- Remove CLAUDE.md from repo (personal config) and add to .gitignore
- Replace docs/develop/act.md with docs/develop/access-control.md
- Remove duplicate docs/develop/gateway-proxy.md
- Add includePatterns file-existence check to validate-llms-txt.mjs: globs checked via globSync, exact paths via readFileSync; would have caught both broken patterns before this review
Thanks for the thorough review — all three items addressed in 22ac0bd:
darkobas2
left a comment
Re-reviewed. All three prior blockers are addressed:
- `CLAUDE.md` removed from commit; now in `.gitignore` ✓
- `act.md` → `access-control.md` and the gateway-proxy path fixed; verified all 22 API + 11 node-ops paths + `docs/bee/installation/**` resolve on the head commit ✓
- Validation script now walks `includePatterns` and warns on missing files ✓
One small nit (won't block): the validator isn't wired into a package.json script or CI workflow, so the new pattern check won't actually run unless invoked manually. Worth a follow-up npm run validate-llms + a CI step.
LGTM, ready to merge.
Summary
Adds two task-specific documentation bundles so AI coding agents can load only what's relevant instead of the full 630KB `llms-full.txt`:
- `llms-api.txt` (220KB, 20 docs): API usage, uploads, stamps, feeds, chunks, encryption, developer tooling
- `llms-node-ops.txt` (225KB, 22 docs): installation, configuration, monitoring, staking, backups, upgrades, FAQ

Uses `docusaurus-plugin-llms` `customLLMFiles`; glob patterns auto-include new pages, so no manual maintenance is needed.

Also adds discovery links in `static/llms.txt` and extends the validation script.

Refs: ethersphere/DevRel#840

Maintenance
- New pages added under `docs/develop/` or `docs/bee/installation/` are automatically included
- `scripts/validate-llms-txt.mjs` logs referenced slice files

Test plan
- `npm run build` succeeds; both files generated
- `llms-api.txt`: 20 documents, correct header/rootContent
- `llms-node-ops.txt`: 22 documents, correct header/rootContent
- `static/llms.txt` references both slices